On Superlinear Lower Bounds in Complexity Theory

Author

  • Kenneth W. Regan
Abstract

This paper first surveys the near-total lack of superlinear lower bounds in complexity theory, for "natural" computational problems with respect to many models of computation. We note that the dividing line between models where such bounds are known and those where none are known comes when the model allows non-local communication with memory at unit cost. We study a model that imposes a "fair cost" for non-local communication, and obtain modest superlinear lower bounds for some problems via a Kolmogorov-complexity argument. Then we look to the larger picture of what it will take to prove really striking lower bounds, and pull from ours and others' work a concept of information vicinity that may offer new tools and modes of analysis to a young field that rather lacks them.

1 The Problem of Superlinear Lower Bounds

When the subject of the NP-complete problems comes up, people think about the "NP ≠ P?" question: whether they have super-polynomial time lower bounds. But the gap in our knowledge is much wider than that: for all but a scant few of these problems, there is currently no super-linear time lower bound, not even for a deterministic Turing machine (DTM) with two tapes. All of the twenty-one problems in Karp's foundational paper [Kar72] extending Cook's Theorem [Coo71] belong to nondeterministic TM linear time (NLIN) under reasonable encoding schemes, but none of them (not SAT, not Clique, nor Hamiltonian Path, nor Subset Sum) has been proved to lie outside DTM linear time (DLIN). We do have the theorem of Paul, Pippenger, Szemerédi, and Trotter [PPST83] that NLIN ≠ DLIN. When unwound, the proof of this theorem provides a lower bound of Ω(n · (log* n)^{1/4}) on the running time of any deterministic TM that accepts a certain language in NLIN (whose construction goes through Σ₄ alternating TMs). This barely-superlinear bound also applies to all languages that are complete for NLIN under DLIN many-one reductions (≤_m). That such languages exist is not a simple matter of defining L = {⟨N, x, 0^m⟩ : the NTM N accepts x within m steps}, because the NTMs N may have any number of tapes. However, Book and Greibach [BG70] showed that every language in NLIN is accepted in real time (i.e., time n + 1) by some NTM N′ with two worktapes. The alphabet of N′ may be arbitrarily large, but at the cost of losing the real time, we can replace N′ by a linear-time N″ with work alphabet {0, 1}. Now define L′ := {⟨N″, x, 0^{m·|N″|}⟩ : N″ accepts x in m steps, where N″ has the form above}. Then L′ is NLIN-hard under ≤_m, and also L′ ∈ NLIN. However, the most efficient versions of Cook's Theorem known to date (see [Rob91, BG93, Sch78]) transform a time-t(n) NTM N into formulas that have O(t(n) log t(n)) variables, so while SAT belongs to NLIN it may not be NLIN-hard. Grandjean [Gra88, Gra90b] proved that a few NP-complete problems are NLIN-hard under ≤_m, including just one listed in [GJ79], called "Reduction to Incompletely Specified Automaton": Given a DFA M with some arcs marked, and an integer k, is there a way to redefine the marked arcs to obtain a DFA M′ such that M′ is equivalent to a DFA M″ that has at most k states? None of the problems identified by Grandjean is known to belong to NLIN, but at least they definitely do not belong to DTIME[o(n(log* n)^{1/4})].

* This work was supported in part by NSF Grant CCR-9409104. Author's current address: Computer Science Department, 226 Bell Hall, UB North Campus, Buffalo, NY 14260-2000. Email: [email protected], tel.: (716) 645-3180 x114, fax: (716) 645-3464.
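To make the role of the padding in L′ concrete, here is a minimal Python sketch. The concrete encoding (comma-separated fields, a hypothetical machine code string) is an assumption of this sketch, not the paper's; the paper only needs some reasonable tupling. The point it illustrates is arithmetic: the pad has length m·|N″|, so a guess-and-check simulation of m steps of the fixed two-worktape N″, costing O(m·|N″|) nondeterministic time, is linear in the instance length, which is what puts L′ in NLIN.

```python
# Toy illustration of the padded language L'.  The encoding below is a
# hypothetical choice for this sketch; the paper assumes only SOME
# reasonable pairing function.

def pad_instance(machine_desc: str, x: str, m: int) -> str:
    """Encode <N'', x, 0^(m * |N''|)> as a single string."""
    padding = "0" * (m * len(machine_desc))
    return ",".join([machine_desc, x, padding])

desc, x, m = "Q0a1RQ1", "1011", 50        # hypothetical machine code string
inst = pad_instance(desc, x, m)
# Guessing-and-checking m steps of N'' costs O(m * |N''|), and the padding
# guarantees this is at most the instance length:
assert m * len(desc) <= len(inst)
```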
Still, if one moves to a machine model that is slightly richer than the standard TM, even the lower bound of [PPST83] goes away. Deterministic TMs with planar tapes may still accept NLIN-complete languages in linear time. The linear-time classes for TMs with d-dimensional tapes (d-TMs) form a hierarchy that many suspect to be proper. To simulate a linear-time d-TM M by a standard TM M′, the best known time is O(n^{2−1/d}). If one requires that the simulation be on-line, meaning in general that every t steps of M are simulated by t′ corresponding steps of M′, then a lower bound that matches this upper bound was proved early on by Hennie [Hen66]. If M is a tree computer (TC), i.e., a TM with binary tree-structured tapes, then this time is O(n²/log n), again with a matching lower bound for the on-line case [Hen66] (see also [PR81, Lou81, Lou83, Lou84, LL92]). However, this does not prove that the linear-time language classes of d-TMs and the TC are distinct from DLIN or from each other. Moreover, none of these classes above DLIN is known to differ from its nondeterministic counterpart. The upshot is that DLIN may be as much as quadratically weaker than these other reasonable notions of linear time, making our inability to prove bounds against DLIN for most NP-complete problems all the more flabbergasting.

One can, of course, construct languages and functions that are not in these linear-time classes by diagonalization. But these methods are intuitively "artificial." Adachi and Iwata [AI84] (see also [KAI79]) proved Ω(n^k) lower bounds on certain pebble games in P, but these are tied closely to TM simulation and diagonalization. What we are most interested in, besides the major NP-complete problems, are "natural" computational tasks of a simpler kind: sorting, finding elements with duplicates on a list, arithmetic in finite fields, Fast Fourier Transform, matrix transpose, matrix multiplication, to name a few. All but the last belong to DTIME[n log n]; the best known time to multiply two n×n integer or Boolean matrices is n^{2.376} [CW90], which gives time N^{1.188} when N = n² is regarded as the input length. The first three have been the focus of several efforts to prove super-linear lower bounds on TMs; these efforts have been neatly summarized by Mansour, Nisan, and Tiwari [MNT93], and their technique is a major topic below. The two simplest languages that have attracted similar efforts¹ are the language of lists with no duplicate elements, and the language of undirected graphs that have a triangle. The former can be solved with one call to sorting, but the best known time to solve the latter (even on a unit-cost RAM) is N^{1.188}, by calculating A², where A is the adjacency matrix of the graph: a triangle exists iff A and A² have a common nonzero entry. Note that the famous Ω(n log n) lower bound on sorting applies to a model where the only operation allowed on numbers is to compare them.

¹ S. Buss [Bus94] has recently proved that the language {⟨φ, n⟩ : φ has a proof in first-order logic that is less than n symbols long} requires time Ω(2^N) infinitely often on a DTM, and time Ω(2^N/N) on an NTM. Here n is written in binary, and the result holds even when φ is restricted to have length N = O(log n) when encoded over a finite alphabet. But this is up at the level of complete sets for (nondeterministic) exponential time. Buss also shows that when n is written in unary and φ can have size O(n), the problem is NP-complete. His techniques hold some promise for progress on whether SAT is NLIN-complete.
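As a concrete rendering of the matrix-squaring test just mentioned, here is a short Python sketch. Plain cubic squaring is used for clarity; the N^{1.188} figure in the text comes from fast matrix multiplication [CW90], not from this loop.

```python
# Triangle detection via the adjacency matrix: a triangle exists iff some
# pair (i, j) is both an edge (A[i][j] = 1) and has a common neighbor
# (A2[i][j] > 0).

def has_triangle(A: list[list[int]]) -> bool:
    n = len(A)
    A2 = [[sum(A[i][k] * A[k][j] for k in range(n)) for j in range(n)]
          for i in range(n)]
    return any(A[i][j] and A2[i][j]
               for i in range(n) for j in range(n))

# A 4-cycle has no triangle; adding the chord 0-2 creates one.
C4 = [[0,1,0,1],[1,0,1,0],[0,1,0,1],[1,0,1,0]]
chord = [row[:] for row in C4]; chord[0][2] = chord[2][0] = 1
assert not has_triangle(C4) and has_triangle(chord)
```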
The lower bound is preserved when numbers can also be added, subtracted, and multiplied [PS84], but is not known when division or bitwise Boolean operations (at log-cost, i.e., per-op cost proportional to the bit-length of the numbers) are allowed. Allow a TM to get its mitts on the bits of a list of m r-bit numbers (say r ≈ 2 log n), and no one has shown that the TM can't sort them in O(n) time, where n = mr is the true bit-length of the list; see the sketch after this paragraph. For more in this line, see [FW90, FW93]. Aggarwal and Vitter [AV88] called the task of extending their lower bounds to models that "allow arbitrary bit-manipulations and dissections of records" a "challenging open problem." They do not offer such a model, and the related work of [AACS87, ACS87, AC88, ACF90, ACS90, Vit91, VN92] still treats integers and records as indivisible units. The sequential-time lower bounds in the last-mentioned papers are "modest," by which we mean Ω(n log n) or Ω(n log log n) and the like. Let us call a bound of Ω(n^{1+ε}), for some fixed ε > 0, strong. A strong lower bound puts a problem out of reach of the quasilinear time class DQL = DTIME[qlin] for TMs, where qlin = n·(log n)^{O(1)}. Schnorr [Sch78] proved (as mentioned above) that SAT is complete for nondeterministic qlin time (NQL) under DQL reductions, and the catalogued results of Dewdney [Dew81, Dew82, Dew89] extend this to many other problems in [GJ79]. Time qlin on the TC may still be quadratically more powerful than DQL in the above sense. However, it is "robust" insofar as it equals time qlin on a wide variety of "reasonable RAM" models: the log-cost RAM of Cook and Reckhow [CR73], the successor RAM (SRAM) and its relatives (see [WW86]), the random-access TMs of Gurevich and Shelah [GS89] (the hub paper for the robustness), the "pointer machines" of Schönhage [Sch80, Sch88], and the models of Grandjean and Robson [GR91], Jones [Jon93], and Grandjean [Gra93, Gra94b, Gra94a]. A strong lower bound against these models also puts a problem out of reach of the "∩_{ε>0} time n^{1+ε}" class of Grädel [Gra90a], which is robust for these RAMs and also for TMs that can have tapes of arbitrary dimension. But even modest lower bounds are hard enough to get, enough to prompt some writers to say that the lowly standard TM is "intractable to analyze for lower bounds."

In search of lower bounds, researchers have studied models that are weaker than the standard TM. Among several sources of super-linear lower bounds for these models, we mention [DGPR84, MS86, LV88, DMS91, LLV92, DM93]. The surveys [WW86, vEB90] give a window onto the vast literature of machine models of all powers. In an attempt to find a unifying principle that divides (1) all the models for which such lower bounds are currently known from (2) those on which we've come up empty (or fourth-root of log-star from empty), we offer the following observation: Does your model permit communication between remote parts of memory at unit cost? Does it operate on bits? Then it is in category (2). For example, the tight superlinear lower bounds (up to some log factors) for matrix transpose and sorting proved in [DMS91, DM93] apply to TMs with only one worktape. A second worktape, or a second head on the one tape, allows remote communication at unit cost and blows away the bound. (See also [FMR72, Kos79, LS81, JL93, JSV94].) VLSI and systolic models by definition allow only local communication, and there has been much success on lower bounds and area-time tradeoffs for them (see [MC80, CM85, Gru90, HKMW92]).
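The possibility of linear-time sorting once bit access is allowed is not idle: on a unit-cost RAM with word operations, textbook LSD radix sort already sorts m numbers of r bits each in O(m·r) = O(n) bit operations. The open question in the text is whether a TM, which lacks unit-cost random access, can match this. A minimal sketch of the standard algorithm (not anything from the paper):

```python
# LSD radix sort on r-bit keys: r stable passes, each linear in m, giving
# O(m * r) = O(n) total bit work on a unit-cost RAM.

def radix_sort(keys: list[int], r: int) -> list[int]:
    for bit in range(r):                      # least significant bit first
        zeros = [k for k in keys if not (k >> bit) & 1]
        ones  = [k for k in keys if (k >> bit) & 1]
        keys = zeros + ones                   # stable partition on this bit
    return keys

assert radix_sort([5, 3, 7, 0, 6, 2], r=3) == [0, 2, 3, 5, 6, 7]
```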
The approach proposed here is a logical response to the above observation: let us allow remote communication, but charge a "fair cost" for it, and then study the effect of this charge on the running time. Section 2 describes a machine model, called Block Move, which carries out this motivation, and which incorporates practical elements such as latency, pipelining, and stream transductions that the older models lack. Section 3 proves "modest" lower bounds in this model for some string-editing and permutation problems. Section 4 reviews the Ω(n²) time-space tradeoff lemmas for functional branching programs (BPs) due to Mansour, Nisan, and Tiwari [MNT93]. The general idea is that if time t on your model translates to time-space o(t²) for BPs, then any function with the tradeoff (this includes sorting and finite-field arithmetic [MNT93]) has a superlinear time lower bound. We point out that these lemmas apply also to nondeterministic BPs computing multivalued functions in the "NPMV" sense of Selman (see [Sel94]). Section 5 shows how these upgraded lemmas combine with the Kolmogorov-complexity argument of Section 3 to yield a lower-bound technique; however, this leads to interesting combinatorial problems about BPs that we have so far been unable to solve. Finally, Section 6 raises the greater goal of proving strong lower bounds. In considering what of greater general value can be learned from Block Move and the "modest" lower bounds, we offer a notion of information vicinity that extends a recent treatment by Feldman and Shapiro [FS92], and that at least attempts to get beyond the notorious relativization results that have cast a pall on many efforts to prove strong lower bounds.

Sources and Acknowledgements. Sections 1 and 2 supplement my earlier survey article [Reg93]. The theorem in Section 3, which solves a problem left open in [Reg93], appeared with only a "Proof Sketch" in [Reg94a] owing to a 6-page limit; here we give more details. The modification of [MNT93] in Section 4 and everything that comes afterward is entirely new. I am grateful to all who have given me comments on those earlier papers, and in particular to Etienne Grandjean and Sam Buss for discussions of their recent work.

2 String Editing and Block Moves

The main idea of the Block Move model can be expressed as a one-person game in which the Player (P) edits a tape that stands for a long sequential file. Let the tape initially hold a string w over {0, 1} in cells 0 through n−1, where n = |w|, and let P have at her disposal a finite work alphabet Γ that includes {0, 1} and the blank B. The Player is given a goal string x, and seeks the least costly way, starting from w, to produce a tape whose first |x| cells hold x. The cost of each editing move by P is calibrated by a function μ : ℕ → ℕ that grades the communication time with memory. The intent is that low-numbered cells are like a fast memory cache, while high locations figuratively reside on a large but slow disk drive. For instance, if P changes the single character in some cell e, the cost is μ(e) time units. The principal μ functions studied here, as earlier in the Block Transfer model of [ACS87] (which is integer-based rather than bit-based, and in other respects weaker than ours), are defined by μ_d(e) = ⌈e^{1/d}⌉, where d ≥ 1 is figuratively the dimension of the memory. The distinctive idea of the game is that if several edits of a similar kind can be done in one block [a . . . b] of consecutive locations in the file, then P should profit from this spatial locality by being charged μ(a) or μ(b) only for the initial access, and unit time per edit thereafter. The notion of "similar kind" is that the edits can be done in one stream that is pipelined through a finite-state machine. Many commands in the Unix(R) stream editor sed are representable this way. The particular model of finite-state machine we use is the deterministic generalized sequential machine (DGSM), as formalized in [HU79] or [HKL92]. The Player P has some finite set S of DGSMs available to her.

Rule 1. In any move, P may mark locations a, b, c, d, and select a DGSM S. Let z be the string held in locations [a . . . b]. Then S(z) is written to locations [c . . . d]. The μ-time for the move is |z| + μ(a′), where a′ = max{a, b, c, d}.

We require that the intervals [a . . . b] and [c . . . d] be disjoint, and that the output S(z) exactly fill the target block. One can have a > b and/or c > d; in particular, substrings can be reversed on the tape. S(z) overwrites any previous content of the target block [c . . . d], except for the following provision:

Rule 2. The blank B may be an output character of DGSMs, and every B in the output stream S(z) leaves the previous symbol in its target cell in [c . . . d] unchanged.

This rule is proved in [Reg94b] to have the same effect as making the write of S(z) completely destructive, but adding an instruction that shuffles two equal-sized blocks into a third. Having Rule 2 enables us to describe all actions by P as a sequence of block moves from Rule 1, viz.:

  S1[a1 . . . b1] into [c1 . . . d1]
  S2[a2 . . . b2] into [c2 . . . d2]
  ...
  SR[aR . . . bR] into [cR . . . dR],

where S1, . . . , SR belong to the fixed finite set S of DGSMs at P's disposal. This editor does not have an insert/delete mode or allow the familiar form of "cut and paste" where the file is joined together after the cut. The realistic file-system model of Willard [Wil92] has similar restrictions. Note that changing a single character in a cell a′ is subsumed by a block move with c = d = a′. The μ-time of the program is the sum of the μ-times of the block moves.

The above defines a natural straight-line program (SLP) model, similar in form to other non-uniform SLP models based on machines, on arithmetical formulas (see e.g. [BF90]), on Boolean circuits (see e.g. [Wig93]), or on bounded-width branching programs (see [BT88, Bar89, BIS90, BS91]). To build a uniform machine around the above, we need to specify control structures and appropriate "hardware." The main results of [Reg94b] show that under any of the μ_d cost functions, the model is linear-time robust under just about any choice borrowed from familiar machines: a form with just one (large) DGSM and one tape can simulate a form with any finite number k of tapes and k-input DGSMs, or a form with random-access addressing of the tapes, with only a constant-factor time slowdown. The above restrictions on overlap and size of the target block can be enforced or waived; it makes no difference.
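To make Rules 1 and 2 concrete, here is a minimal Python sketch of the SLP form. It is a toy rendering under stated assumptions, not the paper's hardware: a DGSM is stood in for by an arbitrary function on strings, and only the μ-time accounting of Rule 1 and the blank-preserving write of Rule 2 are modeled.

```python
import math

def mu(e: int, dim: float = 1.0) -> int:
    """Memory-access charge mu_d(e) = ceil(e^(1/d))."""
    return math.ceil(e ** (1.0 / dim))

def block_move(tape: list, gsm, a: int, b: int, c: int, d: int,
               dim: float = 1.0) -> int:
    """Apply one Rule-1 move 'gsm[a..b] into [c..d]'; return its mu-time.

    a > b (or c > d) reads (or writes) the block in reverse.  Per Rule 2,
    a blank 'B' in the output leaves the target cell unchanged.
    """
    src = range(a, b + (1 if a <= b else -1), 1 if a <= b else -1)
    tgt = range(c, d + (1 if c <= d else -1), 1 if c <= d else -1)
    z = "".join(tape[i] for i in src)
    out = gsm(z)
    assert len(out) == len(tgt), "output must exactly fill the target block"
    for i, ch in zip(tgt, out):
        if ch != "B":                 # Rule 2: blanks preserve old contents
            tape[i] = ch
    return len(z) + mu(max(a, b, c, d), dim)   # Rule 1: |z| + mu(a')

# Example: copy cells [0..3] into [4..7]; then write the bit-flip of the
# reversed block [7..0] into [8..15].
tape = list("1011" + "_" * 12)
t  = block_move(tape, lambda z: z, 0, 3, 4, 7)
t += block_move(tape, lambda z: "".join("1" if ch == "0" else "0"
                                        for ch in z), 7, 0, 8, 15)
print("".join(tape), "mu_1-time =", t)
```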
For definiteness, we use a form with four "fingers" and some finite number (8 is enough [Reg94b]) of "markers," initially placed on cells 0 and n−1, such that after every block move: (1) each marker on a cell i may be moved to cell ⌊i/2⌋, 2i, or 2i+1 at a charge of μ(i), or left where it is at no charge; (2) the fingers "a," "b," "c," "d" for the next move are (re-)assigned to markers; and (3) control branches according to the character now scanned by finger "a." In the SLP form, of course, we don't have to worry about control or which a, b, c, d can legally follow the previous move, and we can have a separate program P for each input length n, or even for each individual w of length n. It is nice that the relevant robustness results carry over, particularly that a one-tape Player can simulate a k-tape Player in linear time. Next we establish lower bounds on certain "non-uniform" problems for the SLPs, with an eye toward using them as ingredients for lower bounds on the natural problems described in Section 1, for the uniform machines.

3 A Kolmogorov Lower Bound Argument

Given two strings w and x of the same length n, define their edit distance E_μ(w, x) to be the least t such that the Player can change w to x in μ-time t. Define e_μ(n) := max{E_μ(w, x) : |w| = |x| = n}. For all d ≥ 1, let E_d and e_d stand for the above functions under μ_d. The idea of the lower bounds is that the time for a block move under μ_d is asymptotically greater than the number of bits required to write the move down. The latter is bounded above by C + 4 log₂(a′), where C is a constant that depends only on the size of the fixed S, and a′ is the maximum address involved in the move. Indeed, the lower bounds work for any finite set of operations, not just DGSMs, and ignore the time to read and write the addressed blocks after the initial μ_d(a′) access charge. The matching upper bounds require only that S contain the DGSM copy and the two DGSMs S_0 and S_1, which run for one step only and write a single 0 or 1. This we tacitly assume in stating:

Theorem 3.1 ([Reg94a]) For any fixed S, e₁(n) = Θ(n log n), and for all d > 1, e_d(n) = Θ(n log log n).

Proof. For the upper bounds, it suffices to bound E_d(0^n, x) for all x of length n. In the case d = 1, we may suppose n = 2^k. The procedure is:

1. Generate the right half of x in cells 0 . . . 2^{k−1} − 1,
2. Copy [0 . . . 2^{k−1} − 1] into [2^{k−1} . . . 2^k − 1],
3. Generate the left half of x in cells 0 . . . 2^{k−1} − 1.

The basis is writing 0 or 1 to cell 0, and the μ₁-time taken is O(n log n). Note, moreover, that the moves are oblivious, insofar as every x of length n uses the same sequence of address 4-tuples (a_i, b_i, c_i, d_i). For integral d > 1, the recursion works on the intervals from n_{k−1} = 2^{d^{k−1}} to n_k = 2^{d^k}, chosen so that n_k/n_{k−1} = n_{k−1}^{d−1}. For each j, 1 ≤ j < n_k/n_{k−1}, it recursively generates the required substring in the first n_{k−1} cells and executes copy [0 . . . n_{k−1} − 1] into [j·n_{k−1} . . . (j+1)·n_{k−1} − 1]. The charges μ(a′) for these copy steps are bounded by D·n_k, where D depends only on d. This gives the recursion T(n_k) = (n_k/n_{k−1})·T(n_{k−1}) + O(n_k), with solution T(n) = O(n log log n). This much is similar to the upper bound methods for the "Touch Problem" in [ACS87]. For non-integral but rational d > 1, the necessary lemmas for computing interval endpoints efficiently may be found in Section 4.2 of [Reg94b].

For the lower bounds, we give full detail for the case d = 1, and a start on the argument for d > 1. Let g(n) be such that for all but finitely many n, e₁(n) ≤ n·g(n).
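A quick empirical check of the d = 1 upper bound is easy to script. The sketch below recreates the doubling schedule's cost recurrence, charging each copy |z| + μ₁(a′) as in Rule 1 and charging 1 for each single-bit write to cell 0; these accounting choices are my reading of the rules, not code from the paper.

```python
def doubling_cost(m: int) -> int:
    """mu_1-time of the oblivious doubling schedule for strings of length m.

    Generate the right half in cells [0..m/2-1], copy it up, then generate
    the left half: T(m) = 2*T(m/2) + (m/2 + (m-1)), where m/2 is |z| and
    m-1 is mu_1(a') for the copy.  A single-bit write to cell 0 costs 1.
    """
    if m == 1:
        return 1
    return 2 * doubling_cost(m // 2) + (m // 2 + m - 1)

for k in (4, 8, 12, 16):
    n = 2 ** k
    print(n, doubling_cost(n), round(doubling_cost(n) / (n * k), 3))
# The last column settles near a constant (3/2), matching e_1(n) = O(n log n).
```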
We will show that g(n) must be Ω(log n). Let n be given, and let k = ⌈log₂ n⌉. Now let x be any string such that the conditional Kolmogorov complexity K(x|0^n) is at least n (see [LV93]). Let P be an SLP that consists of the sequence of moves used to generate x in μ₁-time n·g(n). Note that P itself is a description of x. We will convert P into a "modified" program P″ that generates x from 0^n, and is such that if g(n) = o(log n), then P″ has length o(n), contradicting the choice of x.

For each i, 1 ≤ i ≤ k, call the tape interval [2^{i−1} . . . 2^i − 1] "region i." Cell 0 itself forms "region 0." The portion of the tape from cell 2^k onward is also usable by P; for the purpose of the proof, it is enough to consider it also part of region k. Say that a block move (or marker movement) is "charged in region i" if the memory-access charge μ(a′) recorded for the move is for some cell a′ in region i. Note that any move charged in region i is independent of any information in regions i+1 and above. To simplify some calculations, without affecting μ₁-time by more than a factor of 2, we suppose that the charge for region i is exactly 2^i, and regard n as equal to 2^k.

Now we make an important observation that embodies the connection between μ₁ and the choice of regions.

Claim. P can be modified to an equivalent program P′ such that for each step charged in some region i, the next step is charged in region i−1, i, or i+1, and the μ₁-time of P′ is at most 3 times the μ₁-time of P.

The proof of this is straightforward: if P wants to jump from region i to region j, let P′ make dummy moves in the regions in between. Now we refer to P′ and ignore the constant 3. For each i, 1 ≤ i ≤ k, define N(i) to be the number of steps in P′ charged in region i. Then for some i,

  2^i N(i) ≤ n·g(n)/k.   (1)

Choose the greatest such i; a numeric sketch of this pigeonhole step follows below. We have N(i) ≤ 2^{k−i} g(n)/k, and rewriting (1) another way, N(i) ≤ n·g(n)/(2^i log n). Also, since N(i) ≥ 1, i ≤ k − log(log(n)/g(n)). By the choice of i, we have that for each j > i, N(j) > 2^{k−j} g(n)/k. Hence at least 2^{k−i} g(n)/k moves are charged above region i. Call these 2^{k−i} g(n)/k moves "required" moves. The total μ₁-time charged for these required moves is at least (k−i)·2^k·g(n)/k. Since n = 2^k and the total μ₁-time is assumed to be n·g(n), the total μ₁-time available for all other moves is at most B = n·g(n)(1 − (k−i)/k) = n·g(n)·i/k. By the adjacency condition imposed on P′, all the moves charged in regions i and above fall into at most N(i) segments of the program P′, which we refer to as "high segments." Now the following is a description of x: For each high segment l, 1 ≤ l ≤ N(i), give

(a) the contents w_l of cells [0 . . . 2^{i−1} − 1] prior to the first move of the segment, and
(b) the instructions executed by P′ in that segment.

Finally, after the last high segment, append the first 2^{i−1} bits of x. This finishes P″. Per remarks before Theorem 3.1, each block-move instruction charged in region j can be written down using C·j bits, where the constant C depends only on S. Now we can place upper bounds on the length of the description:

• For intervals between high segments: at most C₀·2^i·(N(i) + 1) < C₀·n·g(n)/(log n) bits, where C₀ depends on the work alphabet size.
• For the required high moves: at most Σ_{j=i+1}^{k} 2^{k−j}·C·j·g(n)/(log n) bits, which is bounded above by (g(n)/log n)·C·(log₂ e)·2i·2^{k−i}.
• For moves charged in region i: at most N(i)·C·i ≤ (g(n)/log n)·C·i·2^{k−i}.
• For the "other" moves charged in regions j > i, let N′(j) = N(j) − 2^{k−j}·g(n)/log n. For the bound we need to maximize
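The choice of region i in (1) is a simple pigeonhole step, and it can be checked numerically. The sketch below uses made-up step counts N(i) (hypothetical data, purely for illustration) to exhibit the greatest "light" region and verify that every region above it carries the "required" moves.

```python
# Pigeonhole step (1): with total mu_1-time spread over k regions whose
# per-move charges are 2^i, some region i has 2^i * N(i) <= total/k.

def greatest_light_region(N: dict[int, int], k: int, budget: int) -> int:
    """Return the greatest i with 2^i * N(i) <= budget/k, as in the proof."""
    return max(i for i in range(1, k + 1) if (2 ** i) * N[i] <= budget / k)

k = 10                                                  # n = 2^k = 1024
N = {i: max(1, 2 ** (k - i) // 3) for i in range(1, k + 1)}   # hypothetical
budget = sum((2 ** i) * N[i] for i in N)                # total mu_1-time
i = greatest_light_region(N, k, budget)
print("chosen region:", i)
# Every region j > i then has N(j) > budget/(k * 2^j): these are the
# 'required' moves whose time the argument trades against description length.
assert all(N[j] > budget / (k * 2 ** j) for j in range(i + 1, k + 1))
```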
